polynomial growth
Integral Signatures of Activation Functions: A 9-Dimensional Taxonomy and Stability Theory for Deep Learning
Mali, Ankur, Hall, Lawrence, Williams, Jake, Richards, Gordon
Activation functions govern the expressivity and stability of neural networks, yet existing comparisons remain largely heuristic. We propose a rigorous framework for their classification via a nine-dimensional integral signature S_sigma(phi), combining Gaussian propagation statistics (m1, g1, g2, m2, eta), asymptotic slopes (alpha_plus, alpha_minus), and regularity measures (TV(phi'), C(phi)). This taxonomy establishes well-posedness, affine reparameterization laws with bias, and closure under bounded slope variation. Dynamical analysis yields Lyapunov theorems with explicit descent constants and identifies variance stability regions through (m2', g2). From a kernel perspective, we derive dimension-free Hessian bounds and connect smoothness to bounded variation of phi'. Applying the framework, we classify eight standard activations (ReLU, leaky-ReLU, tanh, sigmoid, Swish, GELU, Mish, TeLU), proving sharp distinctions between saturating, linear-growth, and smooth families. Numerical Gauss-Hermite and Monte Carlo validation confirms theoretical predictions. Our framework provides principled design guidance, moving activation choice from trial-and-error to provable stability and kernel conditioning.
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- Africa > Mali (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
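The Gauss-Hermite validation mentioned in the abstract above can be illustrated with a minimal sketch that estimates Gaussian expectations of an activation under Z ~ N(0, 1). The abstract does not define the signature components, so the definitions used here (m1 = E[phi(Z)], m2 = E[phi(Z)^2], g1 = E[phi'(Z)], g2 = E[phi'(Z)^2]) are assumptions made for illustration, and the function names `gaussian_expectation` and `tanh_stats` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_expectation(f, n_nodes=80):
    """Approximate E[f(Z)] for Z ~ N(0, 1) by Gauss-Hermite quadrature."""
    # hermegauss integrates against exp(-x^2 / 2), so its weights sum to
    # sqrt(2 * pi); dividing by that constant gives a probability measure.
    nodes, weights = hermegauss(n_nodes)
    return float(np.sum(weights * f(nodes)) / np.sqrt(2.0 * np.pi))


def tanh_stats():
    """Gaussian statistics of tanh under the ASSUMED definitions above."""
    phi = np.tanh

    def dphi(x):
        # Derivative of tanh: sech^2(x).
        return 1.0 / np.cosh(x) ** 2

    return {
        "m1": gaussian_expectation(phi),
        "m2": gaussian_expectation(lambda x: phi(x) ** 2),
        "g1": gaussian_expectation(dphi),
        "g2": gaussian_expectation(lambda x: dphi(x) ** 2),
    }


if __name__ == "__main__":
    for name, value in tanh_stats().items():
        print(f"{name} = {value:.6f}")
```

Quadrature of this kind converges quickly for smooth activations such as tanh; for activations with a kink (ReLU, leaky-ReLU) the convergence is slower, which may be one reason to cross-check against Monte Carlo estimates.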
Generalization Through Growth: Hidden Dynamics Controls Depth Dependence
Sonoda, Sho, Hashimoto, Yuka, Ishikawa, Isao, Ikeda, Masahiro
Recent theory has reduced the depth dependence of generalization bounds from exponential to polynomial and even depth-independent rates, yet these results remain tied to specific architectures and Euclidean inputs. We present a unified framework for arbitrary pseudo-metric spaces in which a depth-$k$ network is the composition of continuous hidden maps $f:\mathcal{X}\to \mathcal{X}$ and an output map $h:\mathcal{X}\to \mathbb{R}$. The resulting bound $O(\sqrt{(\alpha + \log \beta(k))/n})$ isolates the sole depth contribution in $\beta(k)$, the word-ball growth of the semigroup generated by the hidden layers. By Gromov's theorem, polynomial (resp. exponential) growth corresponds to virtually nilpotent (resp. expanding) dynamics, revealing a geometric dichotomy behind existing $O(\sqrt{k})$ (sublinear depth) and $\tilde{O}(1)$ (depth-independent) rates. We further provide covering-number estimates showing that expanding dynamics yield an exponential parameter saving via compositional expressivity. Our results decouple specification from implementation, offering architecture-agnostic and dynamical-systems-aware guarantees applicable to modern deep-learning paradigms such as test-time inference and diffusion models.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- (2 more...)
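To make the dichotomy in the abstract above concrete, the stated bound can be evaluated for two model growth rates of the word ball. The exponent d and the base c below are hypothetical constants introduced only for this illustration; they are not taken from the paper.

```latex
\[
\beta(k) \asymp k^{d}
\;\Longrightarrow\;
O\!\left(\sqrt{\frac{\alpha + d\log k}{n}}\right)
\quad\text{(depth enters only through } \log k\text{, i.e. } \tilde{O}(1) \text{ in } k\text{)},
\]
\[
\beta(k) \asymp c^{k},\quad c>1
\;\Longrightarrow\;
O\!\left(\sqrt{\frac{\alpha + k\log c}{n}}\right)
\quad\text{(depth enters as } \sqrt{k}\text{)}.
\]
```

Under these assumed growth rates, polynomial growth gives the depth-independent regime up to logarithmic factors, while exponential growth gives the $O(\sqrt{k})$ regime.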
Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise
Frikha, Noufel, Pham, Huyên, Song, Xuanye
We consider reinforcement learning (RL) methods for finding optimal policies in linear quadratic (LQ) mean field control (MFC) problems over an infinite horizon in continuous time, with common noise and entropy regularization. We study policy gradient (PG) learning and first demonstrate convergence in a model-based setting by establishing a suitable gradient domination condition. Next, our main contribution is a comprehensive error analysis, where we prove the global linear convergence and sample complexity of the PG algorithm with two-point gradient estimates in a model-free setting with unknown parameters. In this setting, the parameterized optimal policies are learned from samples of the states and population distribution. Finally, we provide numerical evidence supporting the convergence of our implemented algorithms.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)
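The two-point gradient estimates mentioned in the abstract above can be sketched on a toy objective. This is the generic zeroth-order two-point estimator with directions drawn uniformly from the unit sphere, not the paper's estimator for the LQ mean-field setting; the function name `two_point_gradient` and its parameters (`radius`, `n_samples`) are assumptions made for illustration.

```python
import numpy as np


def two_point_gradient(f, theta, radius=1e-2, n_samples=1, rng=None):
    """Estimate grad f(theta) using only function evaluations.

    Each sample draws a direction u uniformly on the unit sphere and uses
    the symmetric difference f(theta + r*u) - f(theta - r*u).
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    grad = np.zeros(d)
    for _ in range(n_samples):
        # A normalized Gaussian vector is uniform on the unit sphere.
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        delta = f(theta + radius * u) - f(theta - radius * u)
        # The factor d compensates for E[u u^T] = I / d.
        grad += (d / (2.0 * radius)) * delta * u
    return grad / n_samples


if __name__ == "__main__":
    weights = np.array([1.0, 2.0, 3.0])

    def quadratic(x):
        return float(np.sum(weights * x ** 2))

    est = two_point_gradient(quadratic, np.ones(3), n_samples=20000,
                             rng=np.random.default_rng(0))
    print(est)  # should be close to the true gradient 2 * weights
```

For a quadratic objective the symmetric difference equals 2r times the directional derivative exactly, so the estimator is unbiased and only its variance depends on the number of samples.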
Reinforcement Learning for Jump-Diffusions
Gao, Xuefeng, Li, Lingfei, Zhou, Xun Yu
We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration-exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and q-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. Finally, we investigate as an application the mean-variance portfolio selection problem with stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps.
- Banking & Finance > Trading (1.00)
- Energy > Oil & Gas > Upstream (0.48)
Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes
We establish novel rates for the Gaussian approximation of random deep neural networks with Gaussian parameters (weights and biases) and Lipschitz activation functions, in the wide limit. Our bounds apply for the joint output of a network evaluated on any finite input set, provided a certain non-degeneracy condition of the infinite-width covariances holds. We demonstrate that the distance between the network output and the corresponding Gaussian approximation scales inversely with the width of the network, exhibiting faster convergence than the naive heuristic suggested by the central limit theorem. We also apply our bounds to obtain theoretical approximations for the exact Bayesian posterior distribution of the network, when the likelihood is a bounded Lipschitz function of the network output evaluated on a (finite) training set. This includes popular cases such as the Gaussian likelihood, i.e. exponential of minus the mean squared error.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Drift Control of High-Dimensional RBM: A Computational Method Based on Neural Networks
Ata, Baris, Harrison, J. Michael, Si, Nian
Motivated by applications in queueing theory, we consider a stochastic control problem whose state space is the $d$-dimensional positive orthant. The controlled process $Z$ evolves as a reflected Brownian motion whose covariance matrix is exogenously specified, as are its directions of reflection from the orthant's boundary surfaces. A system manager chooses a drift vector $\theta(t)$ at each time $t$ based on the history of $Z$, and the cost rate at time $t$ depends on both $Z(t)$ and $\theta(t)$. In our initial problem formulation, the objective is to minimize expected discounted cost over an infinite planning horizon, after which we treat the corresponding ergodic control problem. Extending earlier work by Han et al. (Proceedings of the National Academy of Sciences, 2018, 8505-8510), we develop and illustrate a simulation-based computational method that relies heavily on deep neural network technology. For test problems studied thus far, our method is accurate to within a fraction of one percent, and is computationally feasible in dimensions up to at least $d=30$.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Georgia > Chatham County > Savannah (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Taming under isoperimetry
Lytras, Iosif, Sabanis, Sotirios
In this article we propose a novel taming Langevin-based scheme called $\mathbf{sTULA}$ to sample from distributions with superlinearly growing log-gradient which also satisfy a Log-Sobolev inequality. We derive non-asymptotic convergence bounds in $KL$ and consequently total variation and Wasserstein-$2$ distance from the target measure. Non-asymptotic convergence guarantees are provided for the performance of the new algorithm as an optimizer. Finally, some theoretical results on isoperimetric inequalities for distributions with superlinearly growing gradients are provided. Key findings are a Log-Sobolev inequality with constant independent of the dimension, in the presence of a higher order regularization, and a Poincaré inequality with constant independent of temperature and dimension under a novel non-convex theoretical framework.
- Europe > Greece > Attica > Athens (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (2 more...)
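The idea of taming a Langevin scheme, referred to in the abstract above, can be sketched as follows. This is a generic tamed unadjusted Langevin step in which the gradient is divided by 1 + step * ||gradient||; it is not the sTULA scheme itself, whose exact form the abstract does not give. The function name `langevin_chain` and the toy potential U(x) = x^4 / 4 are assumptions made for illustration.

```python
import numpy as np


def langevin_chain(grad_u, x0, step=0.1, n_steps=1000, tamed=True, rng=None):
    """Run an unadjusted Langevin chain, optionally with a tamed drift."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)  # copy so the caller's x0 is untouched
    for _ in range(n_steps):
        g = grad_u(x)
        if tamed:
            # Taming: the drift step * g now has norm below 1, so a
            # superlinear gradient cannot throw the chain far away.
            g = g / (1.0 + step * np.linalg.norm(g))
        noise = rng.standard_normal(x.shape)
        x = x - step * g + np.sqrt(2.0 * step) * noise
    return x


if __name__ == "__main__":
    def grad_quartic(x):
        # Gradient of U(x) = x^4 / 4, which grows superlinearly.
        return x ** 3

    start = [100.0]
    print(langevin_chain(grad_quartic, start, n_steps=5000,
                         rng=np.random.default_rng(0)))
    # Without taming, the same start diverges within a few steps.
    print(langevin_chain(grad_quartic, start, n_steps=3, tamed=False,
                         rng=np.random.default_rng(0)))
```

Started far from the mode, the untamed chain overshoots by a larger amount at every step, while the tamed chain moves toward the mode by less than one unit per step and then fluctuates around it.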
Online Learning and Disambiguations of Partial Concept Classes
Cheung, Tsun-Ming, Hatami, Hamed, Hatami, Pooya, Hosseini, Kaave
In a recent article, Alon, Hanneke, Holzman, and Moran (FOCS '21) introduced a unifying framework to study the learnability of classes of partial concepts. One of the central questions studied in their work is whether the learnability of a partial concept class is always inherited from the learnability of some "extension" of it to a total concept class. They showed this is not the case for PAC learning but left the problem open for the stronger notion of online learnability. We resolve this problem by constructing a class of partial concepts that is online learnable, but no extension of it to a class of total concepts is online learnable (or even PAC learnable).
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > New York (0.04)
- North America > United States > Indiana > Jackson County > Seymour (0.04)
Implicit Regularization with Polynomial Growth in Deep Tensor Factorization
Hariz, Kais, Kadri, Hachem, Ayache, Stéphane, Moakher, Maher, Artières, Thierry
We study the implicit regularization effects of deep learning in tensor factorization. While implicit regularization in deep matrix and 'shallow' tensor factorization via linear and certain type of non-linear neural networks promotes low-rank solutions with at most quadratic growth, we show that its effect in deep tensor factorization grows polynomially with the depth of the network. This provides a remarkably faithful description of the observed experimental behaviour. Using numerical experiments, we demonstrate the benefits of this implicit regularization in yielding a more accurate estimation and better convergence properties.
Gunasekar et al. (2017) observed that for matrix factorization when there are no constraints on the rank, the solution of the optimization problem via gradient descent turns out to be a low-rank matrix. Furthermore, they conjectured that, with small enough learning rate and initialization, gradient descent on full-dimensional matrix factorization converges to the solution with minimal nuclear norm. Arora et al. (2019) and Razin & Cohen (2020) extended the analysis to deep matrix factorization and showed in this case that implicit regularization of gradient descent cannot be formulated as a norm-minimization problem. By studying the dynamics of gradient descent, they found theoretically and experimentally that it instead promotes sparsity of the singular values of the learned matrix, indicating that implicit regularization in deep learning has to be studied from a dynamical point of view. Moreover, Razin et al. (2021) studied implicit regularization in 'shallow' tensor
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.04)
- North America > United States > Maryland > Baltimore (0.04)
Do Differentiable Simulators Give Better Policy Gradients?
Suh, H. J. Terry, Simchowitz, Max, Zhang, Kaiqing, Tedrake, Russ
Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zero-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $\alpha$-order estimator on some numerical examples.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
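One way to read the $\alpha$-order estimator described in the abstract above is as an interpolation between a first-order and a zeroth-order estimate of the gradient of a smoothed objective F(theta) = E[f(theta + w)], with w ~ N(0, sigma^2). The convex-combination form below, the convention that alpha = 1 is purely first-order, and the name `alpha_order_gradient` are assumptions made for this scalar sketch; the abstract itself does not spell out the formula.

```python
import numpy as np


def alpha_order_gradient(f, df, theta, alpha, sigma=0.1,
                         n_samples=1000, rng=None):
    """Blend first- and zeroth-order estimates of dF/dtheta (scalar theta).

    ASSUMED form: alpha * first_order + (1 - alpha) * zeroth_order.
    f and df must accept NumPy arrays.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(0.0, sigma, size=n_samples)
    # First-order: average the exact derivative at perturbed points.
    first_order = np.mean(df(theta + w))
    # Zeroth-order: Gaussian score-function estimate, with f(theta)
    # subtracted as a baseline to reduce variance.
    zeroth_order = np.mean((f(theta + w) - f(theta)) * w) / sigma ** 2
    return float(alpha * first_order + (1.0 - alpha) * zeroth_order)


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def step(x):
        # Discontinuous objective: 1 for x > 0, else 0.
        return np.heaviside(x, 0.0)

    def dstep(x):
        # Pointwise derivative, zero almost everywhere.
        return np.zeros_like(x)

    for a in (0.0, 0.5, 1.0):
        print(a, alpha_order_gradient(step, dstep, 0.0, a,
                                      n_samples=100000, rng=rng))
```

On the step function the pointwise derivative is zero almost everywhere, so the purely first-order estimate (alpha = 1) returns zero and misses the jump, while the zeroth-order estimate (alpha = 0) still detects it. This is a toy instance of the bias from discontinuities that the abstract refers to.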